Skip to content

fix(backup): clean up staging/temp files on abnormal exit and tighten disk-space pre-check#482

Open
mrrobot47 wants to merge 3 commits into
EasyEngine:developfrom
mrrobot47:fix/backup-cleanup-and-robustness
Open

fix(backup): clean up staging/temp files on abnormal exit and tighten disk-space pre-check#482
mrrobot47 wants to merge 3 commits into
EasyEngine:developfrom
mrrobot47:fix/backup-cleanup-and-robustness

Conversation

@mrrobot47

Copy link
Copy Markdown
Member

Summary

Resource-cleanup, disk-estimate accuracy, and minor-correctness hardening for the backup/restore flow.

Changes

  • Staging-dir / partial-download cleanup on abnormal exit. A backup's half-built archive (EE_BACKUP_DIR/<site>) and a restore's downloaded archive were removed only on success; any error/OOM/Ctrl-C left them to fill the disk (and a stale partial download could be reused). A register_shutdown_function now purges the staging dir on abnormal exit, with the tracked path cleared after the success-path remove so it only fires on failure. In restore() the tracking is armed only after the per-site lock is acquired, so a lock-conflict/bad-id early-exit can't delete a dir a concurrent operation of the same site is writing into.
  • Temp query-file cleanup. query.sql / db_size_query.sql written into the live web root are removed inline; a shutdown handler is a safety net for an interrupt between write and remove. Both cleanup handlers are wrapped in try/catch so a failed unlink (Symfony's Filesystem::remove() throws) can never abort the other shutdown handlers (lock release, dash callback).
  • Disk-space estimate accuracy. The pre-check now sizes the full archived set (app/ + config/, not just app/htdocs), guards false === disk_free_space() and a missing app/, and warns (instead of silently counting 0) when a file or the DB size can't be read. Required space is ceil(total_source * 1.1) (DB counted once — it's already in the source total — plus 10% slack), avoiding both the old under-count and a DB double-count that would false-reject DB-heavy sites.
  • dir_size() robustness. Built with CATCH_GET_CHILD | LEAVES_ONLY so an untraversable subdirectory is skipped rather than throwing UnexpectedValueException and aborting the whole backup; a missing dir returns 0 (some archived dirs are optional).
  • cleanup_old_backups() off-by-one. Retention triggered at N+1; now triggers at N to match the array_slice that keeps N.
  • list_remote_backups() empty result. explode(PHP_EOL, trim('')) yields [''], making the "No remote backups found" branch unreachable; now filters blank lines.

Verification

php -l passes. Reviewed by two independent reviewers (correctness + design); both must-fixes they raised — the concurrent-restore staging-dir ordering (data-loss), the unguarded shutdown handlers, the DB double-count false-reject, and the dir_size abort — are incorporated and confirmed.

Merge note

Built on develop (file-existence per-site lock). The parallel lock-robustness PR (#480) converts that lock to flock; the new disk-check error paths here still call fs->remove(<site>.lock) per the current pattern, so a trivial reconciliation will be needed depending on merge order.

Resource-cleanup, disk-estimate accuracy and minor correctness fixes in
Site_Backup_Restore, all scoped to non-lock/non-archive concerns:

- Register shutdown handlers that purge the local staging dir
  (EE_BACKUP_DIR/<site_url>) and the leftover temp SQL query files on any
  abnormal exit. The success path clears the tracked state so the handlers
  no-op; previously an error/crash leaked the half-built archive (disk fill)
  or a stale partial download a later restore could reuse.
- pre_backup_check(): size the full set of dirs actually archived per site
  type (app/ + config/, not just app/htdocs) plus DB, add headroom for the
  transient uncompressed dump and growing archive, and fail up front when
  free space is undeterminable instead of treating it as 0.
- dir_size()/get_db_size(): warn instead of silently counting an
  unreadable/unparseable size as 0, so the estimate isn't quietly low.
- cleanup_old_backups(): drop the off-by-one (`> N + 1`) so retention
  triggers at exactly N, matching the array_slice that keeps N.
- list_remote_backups(): filter blank lines so the empty-listing
  ("No remote backups found") path is actually reachable.
Address review findings on the cleanup/robustness changes:

- restore(): set $staging_dir and register cleanup_staging_dir only AFTER
  pre_restore_check() has acquired this site's lock (mirroring backup()).
  Registering it earlier let an early exit (invalid backup id, or a lock held
  by another process) delete the staging dir a concurrent backup/restore of
  the same site was actively writing into -- data loss.
- cleanup_staging_dir()/cleanup_temp_query_files(): wrap each shutdown
  safety-net body in try/catch ( \Throwable ). Symfony Filesystem::remove()
  throws IOException on a failed unlink (e.g. permission denied under the
  www-data-owned web root); a throw in the first shutdown handler would abort
  the remaining handlers, leaking the global lock and skipping the EasyDash
  failure callback.
- pre_backup_check(): count the DB size once. $site_size already includes it,
  so multiplying ( $site_size + $db_size ) over-counted the DB ~2.2x and
  false-rejected DB-heavy sites; require ceil( $site_size * 1.1 ) instead.
- dir_size(): build the recursive iterator with CATCH_GET_CHILD (and
  LEAVES_ONLY) so an untraversable subdirectory is skipped instead of throwing
  UnexpectedValueException and aborting the whole backup.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the backup/restore helper by ensuring staging artifacts and temporary query files are cleaned up on abnormal exits, and by improving disk-space preflight estimation/robustness so failures don’t leave behind large partial state.

Changes:

  • Add shutdown-time cleanup for per-site staging directories and temporary SQL query files to prevent disk fill and stale partial artifacts after crashes/interrupts.
  • Tighten disk-space pre-check sizing (include app/ + config/, add 10% headroom, guard disk_free_space()), and make dir_size() resilient to missing/untraversable paths.
  • Fix minor correctness issues in remote backup listing (empty output) and retention logic (off-by-one).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/helper/Site_Backup_Restore.php Outdated
Comment on lines +1001 to +1002
$this->fs->remove( EE_BACKUP_DIR . '/' . $this->site_data['site_url'] . '.lock' );
EE::error( "Site app directory does not exist: $site_root/app" );

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e92c356 — per-site lock removals now go through a best-effort try_remove_site_lock() (try/catch), so a Filesystem::remove() failure on an error path can no longer mask the intended EE::error().

Comment thread src/helper/Site_Backup_Restore.php Outdated
Comment on lines +1036 to +1037
$this->fs->remove( EE_BACKUP_DIR . '/' . $this->site_data['site_url'] . '.lock' );
EE::error( 'Unable to determine free disk space for backup directory.' );

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e92c356 — per-site lock removals now go through a best-effort try_remove_site_lock() (try/catch), so a Filesystem::remove() failure on an error path can no longer mask the intended EE::error().

Comment thread src/helper/Site_Backup_Restore.php Outdated
Comment on lines +1009 to +1013
$db_size = 0;
$site_size = $this->dir_size( $site_root . '/app' );
$site_size += $this->dir_size( $site_root . '/config' );

EE::debug( 'Site size (files): ' . $site_size );

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e92c356 — inlined the single-use $db_size directly into $site_size.

Addresses code-review feedback: Filesystem::remove() throws on a failed unlink, so removing the lock on an error path could raise an uncaught exception and mask the intended EE::error(). Route all per-site lock removals through a best-effort try_remove_site_lock() helper. Also inline the single-use $db_size into $site_size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants